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Abstract 

Background: The rpoB-psbZ (BZ) region of some fern plastid genomes (plastomes) has been noted to go through 
considerable genomic changes. Unraveling its evolutionary dynamics across all fern lineages will lead to clarify the 
fundamental process shaping fern plastome structure and organization. 

Results: A total of 24 fern BZ sequences were investigated with taxon sampling covering all the extant fern orders. 
We found that: (i) a tree fern Plogiogyrio joponico contained a novel gene order that can be generated from either 
the ancestral Angiopteris type or the derived Adiontum type via a single inversion; (ii) the trnY-trnE intergenic spacer 
(IGS) of the filmy fern Vondenboschio rodicons was expanded 3-fold due to the tandem 27-bp repeats which 
showed strong sequence similarity with the anticodon domain of trnY; (iii) the trnY-trnE IGSs of two horsetail ferns 
Equisetum ramosissimum and E. orvense underwent an unprecedented 5-kb long expansion, more than a quarter of 
which was consisted of a single type of direct repeats also relevant to the trnY anticodon domain; and (iv) ycf66 
has independently lost at least four times in ferns. 

Conclusions: Our results provided fresh insights into the evolutionary process of fern BZ regions. The intermediate BZ 
gene order was not detected, supporting that the Adiontum type was generated by two inversions occurring in pairs. 
The occurrence of Vondenboschio 27-bp repeats represents the first evidence of partial tRNA gene duplication in fern 
plastomes. Repeats potentially forming a stem-loop structure play major roles in the expansion of the trnY-trnE IGS. 



Background 

In contrast to nuclear and mitochondrial genomes, plant 
plastid (chloroplast) genomes (plastomes) are generally 
conserved in genome size, gene content and gene order 
[1-3]. This high conservation makes the plastid genes and 
genomes quite amenable for sequencing and be widely 
used in evolutionary and phylogenetic studies. Neverthe- 
less, comparative genomics studies demonstrate that the 
plastomes of several vascular plant lineages such as lyco- 
phytes (Selaginellaceae) [4,5], gymnosperms (e.g. Pinaceae 
[6-8], Cupressaceae [9], Welwitschiaceae [7,10], Gnetaceae 
and Ephedraceae [7]) and various eudicot angiosperm 
lineages (e.g. Geraniaceae [2,11], Campanulaceae [12,13] 
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and Fabaceae [14,15]), have experienced remarkable geno- 
mic changes including significant size variations, complex 
rearrangements as well as substantial gene losses. Many 
reports have shown that highly rearranged plastomes 
usually contain a large number of repetitive elements 
[2,11,12,16]. Furthermore, the distribution of the repeats 
also exhibits a tendency to flank the rearrangement end- 
points, implying an association between the repeat and the 
rearrangement [2,9,11,12,16-18]. Recently, Marechal and 
Brisson [19] specified that the suppression of recombina- 
tion between repeats is of importance in the maintenance 
of plastome stability. Nevertheless, besides rearrangement 
endpoints, abundant repeats are also found in other 
regions of plastomes. For instance, extensive dispersed 
repeats have been found throughout the algae plastome of 
Chlamydomonas reinhardtii [20], and many direct repeats 
derived from partial duplication of their nearby trnY-GUA 
gene have been observed in Douglas-fir (Pseudotsuga men- 
ziesii) [21]. These findings highlight the structural and 
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functional significances of chloroplast DNA (cpDNA) 
repeats. In Chlamydomonas plastomes, it has been shown 
that small dispersed repeats can influence both transcript 
stability and translation efficiency [22] or even function in 
DNA repair [23]. Previous studies, particularly those on 
the complete plastome sequences, have well documented 
the characteristics and distribution of cpDNA repeats 
[2,9,11,12,16,20,24,25]. However, very few investigations 
deal with the implications of the secondary structure of 
cpDNA repetitive elements on their origin, proliferation 
and potential function [26]. Delineating the secondary 
structural features should greatly facilitate our understand- 
ing of plastome evolution. 

A number of comparative chloroplast genomic studies 
have uncovered structural mutations in fern (monilophyte) 
cpDNAs, including as many as 6 inversions and a few gene 
losses [24,27-32]. Specifically, one -3.3 kb inversion (invol- 
ving trnG-GCC to trnT-GGU) [27] and an inverted trnD- 
GUC gene (D inversion) [24] have been detected across 
ferns relative to other land plants. According to gene 
orders, the fern plastomes can be classified into two main 
types. One comprises the plastomes of taxa diversifying 
before the separation of the Schizaeales, which share the 
ancestral gene order and has been assumed to undergo no 
major rearrangements [33]. By contrast, the other com- 
poses the plastomes of core leptosporangiates possessing 
the derived gene order [33]. This derived gene order is 
characteristic of highly rearranged inverted repeats (IRs) 
with the rRNA genes arranged in reverse order in compar- 
ison to all other plants [34]. The rearranged IRs and their 
adjacent section of large single copy (LSC) region are 
thought to be generated by two partially overlapping 
inversions spanning LSC and IR regions [35]. Wolf et al. 
[33] recently illustrated that the two putative inversions 
occurred in pairs on the branch leading to the common 
ancestor of schizaeoid and core leptosporangiate ferns. 

The next striking difference between the ancestral and 
derived gene order is occurred between the rpoB and psbZ 
(BZ) in LSC region (Figure la). BZ region is characterized 
with a high degree of variability. Each of the three key 
inversions shaping the ancestral gene order of ferns, i.e. the 
30-kb inversion [36], the 3.3-kb inversion [27] and the D 
inversion [24] , have at least one of their endpoints located 
within BZ region. Notably, up to five tRNA genes are con- 
centrated in this small region after the three inversions 
(Figure la). This uncommonly high frequency of tRNA 
genes may be relative with the instability of BZ region. 
Roper et al. [28] suggested that the gene order changes 
within BZ region (hereafter the BZ rearrangement) of ferns 
can also be derived from two partially overlapping inver- 
sions by either of the two potential pathways (Figure la). 
Nonetheless, since all the investigated core leptosporangi- 
ates possess the derived BZ order (the same as Adiantum 
type gene order) (Figure la) and no intermediate has been 



identified in any ferns, it has been argued that the two 
hypothetical inversions should take place in pairs in the 
common ancestor of core leptosporangiates [33]. Unfortu- 
nately, the previous studies have only examined four com- 
plete (3 polypods and 1 tree fern) [24,27,30,32] and six 
partial plastome sequences from the leptosporangiates [33]. 
If more samples are examined, the putative intermediates 
may be uncovered. 

In this study, we mainly investigated the evolutionary 
process of BZ region and its sequence components in 
ferns. Twenty-four fern BZ sequences were studied guided 
by the recently published phylogenetic framework [37], 
with a focus on leptosporangiates. Firstly, a novel gene 
order was detected in the tree fern Plagiogyria japonica, 
which may represent the intermediate of BZ rearrange- 
ment or the reverse mutant of the Adiantum type. Sec- 
ondly, a unique 459-bp region, consisting of 17 tandem 
27-bp repeats derived from the partial duplication of the 
adjacent trnY gene, was found to cause the trnE-trnY 
intergenic spacer (IGS) of the filmy fern Vandenboschia 
radicans to expand approximately 3 -fold in length. To our 
knowledge this is the first report of partially duplicated 
tRNA gene in fern plastomes. Thirdly, unexpected 5-kb 
long trnE-trnY IGSs were observed in two horsetail ferns 
Equisetum ramosissimum and E. arvense. More than a 
quarter of the IGSs was comprised of a single type of 
direct repeats possessing the potential to form a highly 
conserved stem-loop structure. The direct repeats may 
have a recent evolutionary origin, frequently conduct copy 
corrections, and are of significant functional relevance. 
And fourthly, the occurrence of ycf66 was confirmed 
highly unstable in ferns with at least 4 times of indepen- 
dent losses. 

Methods 

DNA amplification and sequencing 

Up to date, seven complete plastome sequences of ferns 
have been deposited in GenBank, whose data can be 
directly extracted. Besides these, additional 17 sampling 
taxa were chosen based on the previously published phylo- 
genetic framework of extant ferns [37] to represent all 
major lineages at the order level (Table 1). Young leaves of 
the 17 fern species were collected from Wuhan Botanical 
Garden, Chinese Academy of Sciences (CAS), South 
China Botanical Garden, CAS, and Shenzhen Fairy Lake 
Botanical Garden. Voucher specimens were deposited at 
the herbarium of Wuhan Botanical Garden, CAS. Total 
DNA isolation, primer design, polymerase chain reaction 
(PCR) and DNA sequencing were as previously described 
[24]. 

To obtain the sequences from rpoB to psbZ, the con- 
served flanking regions, partial sequence of rpoB gene 
and GpsbC (psbC to trnG) block (Figure la) were ampli- 
fied, cloned into plasmid vectors (pCR2.1, Invitrogen, 
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Figure 1 Schematic diagrams of the fern plastid gene orders from psbC to rpoB (a) and sequencing strategies (b). Each colored gene 
segment shows the same gene order region among the published fern plastomes. The gene orders of "Putative intermediate A" and "Putative 
intermediate B" are according to Roper et al. [28]. 



Carlsbad, CA) and transformed into E. coli DH5a. At 
least three clones for each PCR product were randomly 
selected and commercially sequenced from both ends 
using ABI 3730x1 DNA Analyzer (Applied Biosystems). 
Species-specific primers were then designed based on 
the flanking sequences and long-range PCR was per- 
formed to amplify rpoB-psbZ region (Figure lb, Strategy 
I). The desired band was gel-purified, sequenced from 
both ends, and then determined the remains by primer 
walking. To avoid the potential error from PCR and 
sequencing, each PCR fragment was independently 
sequenced twice. If they had differences, additional 
sequencings were performed. 

For some samples, whose BZ sequences were unable 
to be completely acquired by primer walking sequencing 
of PCR products because of repeats and/or complex sec- 
ondary structures, a two-step approach was applied 



(Figure lb, Strategy II): first, the regions of CNM (trnC- 
petN-psbM) and DYE (trnD-trnY-trnE) gene blocks were 
amplified, cloned and sequenced; second, species-speci- 
fic primers were designed based on the CNM and DYE 
sequences coupled with the primers from the rpoB gene 
and GpsbC region to amplify the remained sections. At 
least three clones for each PCR product were sequenced. 
The overlapping regions of each pair of adjacent PCR 
fragments exceeded 150 bp. 

The sequences generated in this paper have been 
deposited in GenBank (accession numbers: HQ658094- 
HQ658110) (Table 1). 

Sequence assembly and annotation 

The individual reads were cleaned by removing vector, 
primer and low-quality sequences, then assembled using 
CAP [38] through BioEdit [39]. The assembled sequences 
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Table 1 List of taxa and sequences analyzed in this study 



Taxon 


Family 3 


Order 3 


Collection information 15 


GenBank 


Citation 


Adiantum capillus-veneris 


Pteridaceae 


Polypodiales 




NC_004766 


Wolf et al. 2003 [27] 


Asplenium australasicum 


Aspleniaceae 


Polypodiales 


WBG 


HQ658095 


This study 


Cheilanthes lindheimeri 


Pteridaceae 


Polypodiales 




NC_0 14592 


Wolf et al. 2010 [30] 


Platycerium bifufcatum 


Polypodiaceae 


Polypodiales 


WBG 


HQ658094 


This study 


Pteridium aquilinum 


Dennstaedtiaceae 


Polypodiales 




NC_0 14348 


Der 2010 [32] 


Alsophila spinuloso 


Cyatheaceae 


Cyatheales 




NC 012818 


Gao et al. 2009 [24] 


Plagiogyria japonico 


Plagiogyriaceae 


Cyatheales 


SZBG 


HQ658099 


This study 


Azolla cafoliniana 


Salviniaceae 


Salviniales 


WBG 


HQ658096 


This study 


Ma fs ilea quadrifolia 


Marsileaceae 


Salviniales 


WBG 


HQ658098 


This study 


Salvinia molesto 


Salviniaceae 


Salviniales 


SCBG 


HQ658097 


This study 


Lygodium microphyllum 


Lygodiaceae 


Schizaeales 


SZBG 


HQ658100 


This study 


Dicranopteris linearis 


Gleicheniaceae 


Gleicheniales 


SZBG 


HQ658102 


This study 


Diploptefygium chinensis 


Gleicheniaceae 


Gleicheniales 


SZBG 


HQ658103 


This study 


Dipteris chinensis 


Dipteridaceae 


Gleicheniales 


WBG 


HQ658101 


This study 


Vondenboschio rodicons 


Hymenophyllaceae 


Hymenophyllales 


WBG 


HQ658104 


This study 


Osmunda vachellii 


Osmundaceae 


Osmundales 


WBG 


HQ658105 


This study 


Angiopteris evecto 


Marattiaceae 


Marattiales 




NC_008829 


Roper et al. 2007 [28] 


Botrychium strict urn 


Ophioglossaceae 


Ophioglossales 


WBG 


HQ658108 


This study 


Helminthostachys zeylonico 


Ophioglossaceae 


Ophioglossales 


SCBG 


HQ658107 


This study 


Ophioglossum vulgatum 


Ophioglossaceae 


Ophioglossales 


SZBG 


HQ658106 


This study 


Psilotum nudum 


Psilotaceae 


Psilotales 




NC_003386 


Wakasugi et al. 1998 [ 


Equisetum arvense 1 


Equisetaceae 


Equisetales 


SCBG 


HQ658110 


This study 


Equisetum orvense 2 


Equisetaceae 


Equisetales 




GU191334 


Karol et al. 2010 [31] 


Equisetum romosissimum 


Equisetaceae 


Equisetales 


SCBG 


HQ658109 


This study 



a - the nomenclature follow Smith et al. [37]. 

b -WBG, Wuhan Botanical Garden, CAS; SCBG, South China Botanical Garden, CAS; SZBG, Shenzhen Fairy Lake Botanical Garden. All voucher specimens were 
deposited at the herbarium of Wuhan Botanical Garden, CAS. 



were annotated by DOGMA (Dual Organellar GenoMe 
Annotator) [40]. Start and stop codons were defined 
through comparison to published complete plastome 
sequences available in GenBank. To detect tRNA genes, 
two online programs were employed, ARAGORN vl.2 
[41] and tRNAscan-SE v.1.21 [42], The putative promo- 
ters were identified by running BPROM [43]. 

Repeat sequence analyses 

The sequences were initially scanned with REPuter [44] at 
a repeat length > 20 bp with a Hamming distance of 3. For- 
ward (direct), reverse, complement and reverse comple- 
ment repeats were all recognized under REPuter. Repeated 
sequences were unusually abundant in E. ramosissimum 
and E. arvense. For them, repeats were further identified 
and classified by the VMATCH software package [45]. For 
each sequence, an index was constructed using MKVTREE 
program with the -dna -pi -allout and -v options. Direct 
repeats > 20 bp were identified using VMATCH and then 
divided into distinct families with MATCHCLUSTER by 
allowing 15% sequence dissimilarity (-erate option set to 
15). The sequences of each family were extracted with 
VMATCHSELECT. Like REPuter, the VMATCH identifies 
all overlapping repeats and thus overestimates the number 



of repetitive elements in a given sequence. To avoid this 
issue, the redundant overlapping repeats were masked. The 
consensus for each family was then generated from a 
CLUSTAL X [46] alignment. 

The secondary structures of repeated sequences were 
predicted by Mfold web server [47] with default para- 
meters. Most of the repeats found in horsetails have a 
stem-loop structure with a 7-nt loop. Then, we designed 
a Perl script (available on request) to detect the 
sequence fragments which have the following stem-loop 
structure characteristics: loop length = 7 and stem 
length >3. The identified stem-loop sequences were 
assigned to distinct families according to their stem 
sequences afterwards. 

Phylogenetic analyses 

A total of 5 protein-coding (petN, psbC, psbM, psbZ, 
rpoB) and 6 tRNA gene (trnC-GCA, trnD-GUC, trnE- 
UUC, trnG-GCC, trnS-UGA, trnY-GUA) sequences were 
extracted from 17 new generated fern plastid sequences 
from psbC to rpoB in this study (Figure 1). The coding 
sequences of these 11 genes were also acquired from 
the completed plastomes of 6 ferns, i.e. Adiantum capil- 
lus-veneris, Alsophila spinulosa, Angiopteris evecta, 
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Cheilanthes UndheimerU Psilotum nudum and Pteridium 
aquilinum, as well as 2 seed plant outgroups, i.e. 
Amborella trichopoda (NC_005086) and Cycas taitun- 
gensis (NC_009618), according to their annotations in 
GenBank. The nucleotide sequences of each tRNA gene 
were aligned in MUSCLE [48] with manual inspection. 
For protein-coding genes, nucleotide sequences for each 
gene were translated into amino acids, aligned in MUS- 
CLE [48]. Nucleotide sequences were aligned by con- 
straining them to the amino acid sequence alignment 
followed by manual adjustments. A Nexus file compris- 
ing 5,525 characters was generated after alignment was 
completed. 

Phylogenetic analyses were performed using maximum 
likelihood (ML) (GARLI vl.0.699) [49] and Bayesian 
inference (BI) (MrBayes v3.1.2) [50]. The most appropri- 
ate model (GTR+I + G) of nucleotide evolution was 
determined by using the Akaike Information Criterion 
via Modeltest 3.7 [51]. For ML, three independent runs 
were conducted in GARLI, using default parameters 
except that automated stopping criterion set at 20,000 
generations (genthreshfortopoterm = 20000). A total of 
1,000 ML Bootstrap (BS) replicates was also performed 
using GARLI. Likelihood scores were calculated by 
using PAUP v4.10 [52]. For BI, each run started with a 
random tree, default priors and four Markov chains, and 
were sampled every 100 generations. Three independent 
analyses were run for 1 x 10 7 , 1.5 x 10 7 and 2 x 10 7 
generations. Convergence was confirmed by Tracer 1.5 
[53]. Twenty-five percent of burn-in trees were 
discarded. 

Results and Discussion 

The process of rpoB-psbZ rearrangement 

Two putative pathways have been proposed for describ- 
ing the evolutionary process of the complex gene order 
change between rpoB and psbZ through fern evolution 
(Figure la) [28]. However, no direct evidence is pro- 
vided for either of them. Figure 2 shows the BZ gene 
order in 24 samples representing all the 11 extant fern 
orders (Table 1) [following reference 37]. Two blocks of 
genes, CNM (trnC-petN-psbM) and DYE (trnD-trnY- 
trnE), are found to be conserved across ferns. Nearly all 
core leptosporangiates excluding Plagiogyria japonica 
have the same gene arrangement pattern as that 
observed in Adiantum capillus-veneris [27] (hereafter 
the Adiantum type). By contrast, all basal ferns and 
early branches of leptosporangiates share the gene order 
previously found in Angiopteris evecta [28] (hereafter 
the Angiopteris type). Unlike other core leptosporangi- 
ates, the tree fern P. japonica (Plagiogyriaceae) does not 
present the Adiantum type order. Instead its gene order 
(hereafter the Plagiogyria type) seems to derive from the 



Angiopteris type via a large inversion spanning from 
trnC-GCA to trnE-UUC ("CE inversion" in Figure 2) or 
from the Adiantum type through a small inversion only 
involving the DYE block ("DE inversion" in Figure 2). 

The Plagiogyria type order seemingly represents the 
intermediate of BZ rearrangement. If this hypothesis is 
true, we might speculate that the Adiantum type is formed 
through two serial inversions, first the large CE inversion 
and then the small DE inversion (as shown in Figure 2). 
For the CE inversion, the most parsimonious explanation 
is that it occurred only once and on the common ancestor 
of core leptosporangiates (Figure 3a), because the Adian- 
tum type has been observed in all the three core leptos- 
porangiate lineages. The next question is at which 
evolutionary stage the DE inversion event occurred? 
Recent studies have identified Plagiogyriaceae as a lineage 
of tree ferns [54-61]. Thus, it is reasonable to expect that 
the Adiantum type found in tree ferns directly arose from 
the Plagiogyria type. As for the Adiantum type in other 
core leptosporangiate ferns, intuitively it is also intended 
to infer that this order was derived from the Plagiogyria 
type. However, current knowledge of the phylogenetic 
positions of both Plagiogyriaceae and tree ferns make the 
speculation implausible. Molecular phylogenetic analyses 
have shown that tree ferns are the sister group of poly- 
pods, and then the two groups jointly compose the sister 
group to heterosporous ferns (Figure 2, Additional file 1) 
[56,58,59,61-65]. If it is presumed that the Adiantum type 
observed in heterosporous and polypod ferns originated 
directly from the Plagiogyria type, there should exist 
unknown polypod and heterosporous fern species that 
possess the same intermediate gene order as that of Plagi- 
ogyria. In other words, once the Plagiogyria type is 
hypothesized to be the intermediate form of the BZ rear- 
rangement, the putative DE inversion would have had to 
independently occur at least three times (each in the three 
core leptosporangiate lineages, respectively) to transit the 
Plagiogyria type into the Adiantum type (Figure 3a). 
Therefore, taking the Plagiogyria type as the intermediate 
form actually becomes a very unlikely pathway for estab- 
lishing the derived BZ gene type. 

An alternative interpretation is that the Plagiogyria 
type merely represents a derivative of the Adiantum type 
via a reverse DE inversion (Figure 3b). As shown in figure 
2, the DYE block is quite short, merely -300-500 bp in 
most leptosporangiates. Since it is well recognized that 
the small-scale inversion is highly prone to reversal and 
parallelism [66], and the high degree of rearrangements 
is often associated with tRNA genes [12], here we would 
propose that the occurrence of the reverse DE inversion 
should be of great possibility. If this is indeed the case, 
then the exact process of the alteration of Angiopteris 
type to Adiantum type remains an open question. 
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Figure 2 The gene organization from rpoB to psbZ in analyzed ferns. The arrows correlate with the location, size and transcription direction 
of the corresponding genes. Dashed lines indicate direction of transcription does not change; solid lines mark putative local inversions. The 
complete version of the tree including statistical supports and branch lengths is shown in Additional file 1. *, the plastomes have been 
sequenced. The symbols of ycf66: green, complete gene; grey, pseudogene. Abbreviations: Z, psbZ; G: trnG-GCC; E: fmE-UUC; Y: rniV-GUA; D, trnD- 
GUC; M, psbM; N, petN; C, trnC-GCA; B: rpoB; Equisetum o, Equisetum orvense; Equisetum r, Equisetum romosissimum. 



frnD-GUC inversion 

Three consecutive tRNA genes, trnD-GUC, trnY-GUA 
and trnE-UUC, are embedded in the BZ region. In seed 
plants, they have been shown to constitute an operon 
{trnE operon) whose transcript is processed to produce 
individual tRNA molecules [67]. Nevertheless, in our pre- 
vious report, the trnD gene was found to have an oppo- 
site transcriptional direction relative to trnY and trnE in 



ferns based on the four completely sequenced fern plas- 
tome data available at that time [24]. With the newly 
determined sequences here, our previous speculation 
that the minor D inversion is shared by all fern lineages 
was further corroborated. Since the trnD is inverted, it is 
reasonable to assume that this gene is unable to be co- 
transcripted with trnY and trnE. In addition, the con- 
served "-35 box" and "-10 box" promoter sequences were 
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Figure 3 Two potential explanations for the origin of the Plagiogyria gene order. PI, Plogiogyrio. The minimal numbers of inversion events 
compared to the Angiopteris type are provided for each branch, (a), "CE" and "DE" indicate the putative CE and DE inversions as figure 2, 
respectively. The dashed lines show the hypothetical branches with no experimental evidence, (b), "2 inversions" denotes the two hypothetical 
inversions that converted the Angiopteris type to the Adiontum type [28]. 



also found upstream of the trnD gene in all the studied 
ferns (Additional file 2), further supporting that the tran- 
scription of the inverted trnD gene is independent of the 
trnE operon. 

Intergenic spacers 

Sizes of the sequences between rpoB and psbZ are 
highly variable in ferns, ranging from 2,744 bp in Psilo- 
tum nudum to 7,546 bp in E. ramosissimum. The size 
variability is directly linked to the size of IGS, since 
both gene content and length are highly conserved in 
the BZ region (Figure 2). 
The IGS of trnY-trnE 

The sizes of trnY-trnE IGS (YE-IGS) are largely conser- 
vative in ferns, most of them ranging from 95 to 179 bp 
(Figure 2). The smallest YE-IGS, merely 16 bp, is 
detected in Platy cerium wallichii (a polypod fern). In 
stark contrast, one filmy fern and two horsetails have 
experienced dramatic expansion of this region, reaching 
as long as 619 bp, 4,872 bp and 5,000 bp in Vanden- 
boschia radicans, E. arvense (our sequence, hereafter 
E. arvense 1) and E. ramosissimum, respectively. The 
unusual 5-kb long YE-IGS of E. arvense was also noted 
in the recently published report documenting its com- 
plete plastome sequence [[31], hereafter E. arvense 2]. 
The unexpected large IGS leads us directly to the ques- 
tion of how the region is organized and where its com- 
ponent module originates from. 

As for V. radicans YE-IGS, a total of 17 tandem 27-bp 
quasi-identical repeats were found, falling into three 
modules (Figure 4a). The first contains five 27-bp repeats, 
while the other two each include six 27-bp repeats 
(Figure 4b). Interestingly, the two 6 x 27 modules are 
identical: both are composed of one distantly homolo- 
gous 27-bp head upstream of five nearly identical 27-bp 
segments (there is only a single base pair difference 
among the five repeats) (Figure 4b). We noticed that the 
sequences of the 27-bp repeats resemble a 25-bp section 



of the trnY gene (Figure 4b, c), corresponding to the 
entire anticodon arm and the stem of the D arm. Simi- 
larly, the duplications of this trnY region were also char- 
acterized in Douglas-fir [21]. To our knowledge, this 
partial tRNA gene duplication has not been reported in 
ferns before. Like the trnY anticodon arm, the 27-bp 
repetitive elements also possess the potential to fold a 
similar stem-loop structure. The independent occur- 
rences of the partial trnY duplications in filmy fern as 
well as Douglas-fir imply that the anticodon domain 
sequence of trnY has a tendency to duplicate and prolif- 
erate, possibly relative to its stem-loop secondary 
structure. 

The VMATCH software package was used to identify 
and classify the dispersed repeats in Equisetum. A total 
of 85 (82 direct and 3 palindromic) and 441 (440 direct 
and 1 palindromic) matches > 20 bp were detected in 
the BZ sequences of E. ramosissimum and E. arvense 1, 
respectively. All the direct matches but one from 
E. ramosissimum resides in the YE-IGS. To affirm the 
existence of this large number of repeats in E. arvense, 
the E. arvense 2 plastome sequence was also analyzed by 
using VMATCH. 560 direct and 20 palindromic 
matches were recognized, of which 548 direct matches 
located in the YE-IGS. The YE-IGS thus far becomes 
the most repeat-rich region found in the E. arvense 
plastome. 

After filtering the overlapping repeats, 54 and 84 non- 
redundant direct repeats were identified in the YE-IGS of 
E. ramosissimum and E. arvense 1, respectively. Based on 
sequence similarity, the repeats fell into 16-18 families 
(Table 2). Their secondary structures were then predicted 
by using Mfold web server [47] (Additional file 3-4). 
Remarkably, most of the repeats, 45 out of 54 in E. ramo- 
sissimum and 76 of 84 in E. arvense 1, were shown to 
have the potential to fold into similar stem-loop struc- 
tures with a 7-nt A-rich loop and various length stem. 
These stem-loop repeats produce a consensus mark of 



Gao et al. BMC Plant Biology 201 1, 11:64 
http://www.biomedcentral.com/1471-2229/11/64 



Page 8 of 1 3 



(a). Repeat location 
trnG 



trnE 



- 35 u 10 



trnY trnD 



psbM 



Y 

27 bp x 17 

(b). 27-bp repeats and partial sequence of trnY gene 



] 



-1 

3 
4 
5 
6 
111-1 
2 
3 
4 
5 



trnY 



AAT G AG AAAC GG AC T G T AAAT C C G T T C 
AAC GAG AAAC AG AC T C T AAAT CTGTTT 
GATAGGAGACGGACTCTAAATCCGTTC 
ATTGGGAGACGGACTCTAAATCCGTTC 
ATTGAGAGACGGACTCTAAATCCGTTC 
AT T GGG AAAC AAAC C C AG AAT C C G T T C 
GATAGGATACGGACTCTAAATCCGTTC 
GATAGAATACGGACTCTAAATCCGTTC 
GATAGAATACGGACTCTAAATCCGTTC 
GATAGGATACGGACTCTAAATCCGTTC 
GATAGGATACGGACTCTAAATCCGTTC 
AT T GGG AAAC AAAC C C AG AAT C C G T T C 
GATAGGATACGGACTCTAAATCCGTTC 
GATAGAATACGGACTCTAAATCCGTTC 
GATAGAATACGGACTCTAAATCCGTTC 
GATAGGATACGGACTCTAAATCCGTTC 
GATAGGATACGGACTCTAAATCCGTTC 
1 10 20 27 

(aatggg-gacggactgtaaatccgct) g 

21 30 40 46 

wwAnticodon.... 
>>>> Loop <<<c< 



100 bp 

(c). secondary structure of trnY-GUA 



D loop 



U G 



G 
U 



u a a 

21 



A 
G 

U ( 



g.c 
g.c 
g.c 

U.A 
C.G 
G.C 
A.u 



U 



-y c 

<S G 

G 



G 1 

C 

A 



Ti|jC loop 

A A 
A 



c 



G. 



C »GCUG* c 



C 



u 

G 



U A 



C 

•u 

A 
A 



45 G G 
C* U 

G A 

Variable loop 



c o</< 



Figure 4 The 27-bp quasi-identical repeats found in Vandenboschia radicans. 

promoter sequences predicted by BPROM [43]. 



-35" and "-10" denote conserved "-35 box" and "-10 box" 



three successive adenine nucleotides ("AAA") proximate 
to the stem (Additional file 3-4). Their total sizes are 
1,154 and 2,014 bp in E. ramosissimum and E. arvense 1 
sequence, respectively. The uncommon abundance of the 
repeats implies that they may correlate to the unexpected 
expansion of the huge YE-IGS in Equisetum. 

In order to test the correlation between the prolifera- 
tion of the stem-loop sequences and the expansion of 
YE-IGS, we composed a Perl script to ascertain the exact 
amount and the distribution of the stem-loop repeats 
(parameters: loop size = 7, stem length > 3). 90, 96 and 
102 hits representing the putative stem-loop structure 
were identified in the YE-IGS of E. ramosissimum, E. 
arvense 1 and 2 sequences, respectively. The majority of 
them, namely 68 in E. ramosissimum, 78 in E. arvense 1 
and 82 in E. arvense 2 sequence (Table 3), possess the 
sequential "AAA" immediate to the stem (Figure 5b). 
The stem lengths of these A-rich stem-loop elements 
range from 3 to 13 bp (Table 3). It is worthy to note that 
the total lengths of the repeats appropriate more than 
one quarter of the Equisetum YE-IGS, i.e. 25.72%, 28.57% 
and 28.65% in E. ramosissimum, E. arvense 1 and 2, 



respectively. In addition, the distribution of the stem- 
loop repeats is not restricted in a given small region but 
throughout the entire YE-IGS (Figure 5a). Our results 
suggest that the proliferation of the stem-loop repeats is 
directly correlated to the expansion of the YE-IGS in 
Equisetum. 

The stem-loop sequences fell into 2-4 families according 
to the first three stem base pairs proximate to the loop 
(Figure 5b). The most abundant is the GGA/TCC family, 
which may represent the prototype of the other families. 
The conserved GGA/TCC stem, 7-nt loop and "AAA" sig- 
nature push us to postulate that the stem-loop elements 
may derive from tRNA anticodon arm, because the latter 
often possess the same stem-loop characteristics. The 
complete E. arvense plastome sequence data shows that at 
least 4 tRNAs, i.e. trnC-GCA, trnF-GAA, trnL-UAA and 
trnY-GUA (Figure 5c), exhibit the GGA/UCC stem core, 
the 7-nt loop and the "AAA" signature on their anticodon 
regions. Of them, the trnY locus is exactly neighbor to the 
repeat region (Figure 5a). Occurrences of trnY-anticodon- 
arm-related repeats that are close to trnY gene have also 
been documented in Douglas-fir [21] as well as the 
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Table 2 Repeat families in the YE-IGS of Equisetum ramosissimum and E. arvense 1 sequence identified by VMATCH 
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a - Families of non-overlapping repeats sharing > 85% sequence identities; 






b - The underlined DNA strings are mutually complementary and have potential to form a paired double helix. The putative secondary structures of the repeat 


families are provided in Additional file 3-4. 






aforementioned Vandenboschia (Figure 4). Taken the 


conserved stem-loop structure detected among the Van- 


information together, 


we suggest that the trnY-GUA gene 


denboschia and Equisetum repeats suggests that the 


is possibly the origin 


of the stem-loop repeats, although 


repeats should potentially have 


a recent evolutionary ori- 


other alternatives cannot be definitively ruled out. In con- 


gin, frequent copy corrections, and certain functional 


trast to the sizes and the primary sequences, the stem- 


roles. Stem-loop structures 


have commonly been 


loop structures of the repeats appear to be highly 


observed in the plastome IGS 


regions [66,68-70]. Their 


conservative. 




loop regions are often associated with hot spots for muta- 


The "-35 box" and ' 


'-10 box" promoter sequences were 


tions, while the stem-forming sequences frequently being 


predicted upstream of trnY in Vandenboschia and Equi- 


conserved [66]. Most plastid transcripts potentially form 


setum (Figure 4a, 5a), implying that the long YE-IGS may 


stem-loops in their 5' untranslated regions (5'-UTRs) and 


function in regulating 


I the trnY transcription. The highly 


3'-UTRs [71-73], which are 


thought to function in 
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Table 3 The occurrence of putative stem-loop sequences 
with 7-nt loop and "AAA" signature in the YE-IGS of 
Equisetum ramosissimum and E. arvens e 


Stem length (base 


Equisetum E. 


arvense 


E. arvense 


pair) 


ramosissimum 


1 


2 


3 


21 


24 


28 
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12 


12 
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22 


24 
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9 


6 
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10 


2 
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2 


13 


2 


0 


0 


Total 


68 


78 


82 



mRNA maturation, accumulation, and translation 
[22,71-76]. The dramatic proliferation of stem-loop 
repeats in the Vandenboschia and Equisetum plastomes 
provides a trigger for their neofunctionalization. For 
instance, the repeats might involve in the transcriptional 



and/or post-transcriptional regulation of the neighbor 
trnY gene. 

The IGS of psbM-petN and the occurrence of ycf66 

The other highly variable IGS is located between psbM 
and petN genes (MN-IGS) (Figure 2). The longest MN- 
IGS (1,788 bp), found in Plagiogyria adnata, is about 
8 times longer than the shortest in Psilotum nudum 
(204 bp). Previous researches documented an open read- 
ing frame (ORF) designated ycf66 in the MN-IGS of 
Angiopteris evecta [28] and a pseudogenized ycf66 copy in 
both of Alsophila spinulosa [24] and Equisetum arvense 
[31]. Here we further identified a complete ycf66 in Botry- 
chium virginianum (Ophioglossaceae) and all sampled 
"non-core" leptosporangiates (Osmundales, Hymenophyl- 
lales, Gleicheniales and Schizaeales) (Figure 2). ycf66 
appears to be pseudogenized in Helminthostachys zeyla- 
nica (Ophioglossaceae), Equisetum, and tree ferns (Figure 
2). By contrast, it was undetectable in Ophioglossum vulga- 
tum (Ophioglossaceae), Psilotum, and polypods. Hence 
ycf66 may have been independently lost at least four times 
in fern lineages Ophioglossales, Psilotales, Equisetales, and 
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core leptosporangiates. Generally, the MN-IGS containing 
no ycf66 is shorter than that carrying ycf66 or its pseudo- 
gene (Figure 2). For instance, of the three Ophioglossao 
eous ferns, the MN-IGS sizes of Botrychium (1,393 bp, 
containing intact ycf66) and Helminthostachys (1,324 bp, 
containing ycf66 pseudogene) are one time longer that of 
Ophioglossum (628 bp, containing no ycf66) (Figure 2). 
The highly unstable occurrence of ycf66 suggests that it 
seems unessential for the fern plastid function, or it has 
been transferred to nuclear genome. 

Conclusions 

The tRNA-rich BZ region of fern plastomes exhibited con- 
siderable variation in size, gene order, and repeat content. 
Here a novel BZ gene order was identified in the tree fern 
Plagiogyria japonica. Our comparative analysis subse- 
quently showed that the plastomes of extant fern lineages 
may not contain the putative intermediates of BZ rearran- 
gement, pointing to the conclusion that the Adiantum 
gene order was generated by two inversions occurring in 
pairs [33]. The trnY-trnE IGS in the filmy fern Vanden- 
boschia radicans was expanded substantially due to the 
tandem 27-bp repeats resembling the anticodon domain 
of trnY. This result provided the first evidence of partial 
tRNA gene duplication in fern plastomes. In general, the 
detection of slight length variation in chloroplast IGS 
region is not uncommon [e.g. [7,10,11,20]]. Nevertheless, 
it is unprecedented that the Equisetum trnY-trnE IGSs 
were found to undergo an expansion as large as 5-kb. 
These IGS sequences were consisted of a large amount of 
stem-loop repeats, which may also have an evolutionary 
link to the trnY anticodon domain. In addition, the parallel 
losses of ycf66 in ferns were corroborated. 
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